SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation

Authors

  • Wei Shen
  • Shuai Le
  • Yan Li
  • Fuquan Hu

Abstract

FASTA and FASTQ are basic and ubiquitous formats for storing nucleotide and protein sequences. Common manipulations of FASTA/Q files include converting, searching, filtering, deduplication, splitting, shuffling, and sampling. Existing tools implement only some of these manipulations, often inefficiently, and some are available only for certain operating systems. Furthermore, the complicated installation of required packages and running environments can make these programs less user-friendly. This paper describes SeqKit, a cross-platform, ultrafast, comprehensive toolkit for FASTA/Q processing. SeqKit provides executable binary files for all major operating systems, including Windows, Linux, and Mac OS X, and can be used directly without any dependencies or pre-configuration. SeqKit demonstrates competitive performance in execution time and memory usage compared to similar tools. The efficiency and usability of SeqKit enable researchers to rapidly accomplish common FASTA/Q file manipulations. SeqKit is open source and available on GitHub at https://github.com/shenwei356/seqkit.
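
To make the manipulations named in the abstract concrete, here is a minimal Python sketch of three of them: parsing FASTA records, deduplicating by sequence, and filtering by length. It is an illustration only and does not reflect how SeqKit itself is implemented; the function names `parse_fasta`, `dedup_by_sequence`, and `filter_by_length`, and the sample records, are hypothetical.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dedup_by_sequence(records: Iterable[Record]) -> Iterator[Record]:
    """Keep the first record for each distinct sequence (case-insensitive)."""
    seen = set()
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            yield header, seq


def filter_by_length(records: Iterable[Record], min_len: int) -> Iterator[Record]:
    """Keep records whose sequence has at least min_len residues."""
    for header, seq in records:
        if len(seq) >= min_len:
            yield header, seq


# Hypothetical sample data, not taken from the paper.
EXAMPLE = """>seq1 first
ACGT
ACGT
>seq2 duplicate of seq1
acgtacgt
>seq3 short
AC
"""

records = list(parse_fasta(EXAMPLE.splitlines()))
unique = list(dedup_by_sequence(records))
long_enough = list(filter_by_length(unique, 4))
```

Because each step is a generator, the steps can be chained over a file handle without loading the whole file into memory, which matters for large sequence files.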

Similar resources

Software News and Updates A Toolkit to Assist ONIOM Calculations

A general procedure for quantum mechanics and molecular mechanics (QM/MM) studies on biochemical systems is outlined, and a collection of PERL scripts to facilitate ONIOM-type QM/MM calculations is described. This toolkit is designed to assist in the different stages of an ONIOM QM/MM study of biomolecules, including input file preparation and checking, job monitoring, production calculations, ...

DendroPy: a Python library for phylogenetic computing

DendroPy is a cross-platform library for the Python programming language that provides for object-oriented reading, writing, simulation and manipulation of phylogenetic data, with an emphasis on phylogenetic tree operations. DendroPy uses a splits-hash mapping to perform rapid calculations of tree distances, similarities and shape under various metrics. It contains rich simulation ro...

MRT dump file manipulation toolkit (MDFMT) - version 0.1

The MRT routing information export format represents an effective way of storing BGP routing information in binary dump files. Although a few tools exist to extract data from MRT dump files, most of them do not allow repacking or creating such MRT files. The MRT dump file manipulation toolkit (MDFMT) allows repacking parts of large MRT dump files containing BGP update messages into smaller ones...

MFPPI – Multi FASTA ProtParam Interface

Physico-chemical properties reflect the functional and structural characteristics of a protein. The comparative study of physico-chemical properties is important for understanding the role of a protein and for exploring its molecular evolution. A number of online and offline tools are available for calculating the physico-chemical properties of a single protein sequence. However, a tool is not available for a ...

Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit

BACKGROUND Scripting languages such as Python are ideally suited to common programming tasks in cheminformatics such as data analysis and parsing information from files. However, for reasons of efficiency, cheminformatics toolkits such as the OpenBabel toolkit are often implemented in compiled languages such as C++. We describe Pybel, a Python module that provides access to the OpenBabel toolki...

Journal title:

Volume 11, Issue

Pages -

Publication date: 2016